A Restless Bandit Formulation of Multi-channel Opportunistic Access: Indexablity and Index Policy

Authors

  • Keqin Liu
  • Qing Zhao
Abstract

We focus on an opportunistic communication system consisting of multiple independent channels with time-varying states. With limited sensing, a user can only sense and access a subset of channels and accrue rewards determined by the states of the sensed channels. We formulate the problem of optimal sequential channel selection as a restless multi-armed bandit process, for which a powerful index policy, Whittle’s index policy, can be implemented based on the indexability of the system. Exploiting the underlying structure of the multi-channel opportunistic access problem, we establish the indexability and obtain Whittle’s index in closed form for both discounted reward and average reward criteria. These results lead to the direct implementation of Whittle’s index policy with remarkably low complexity. Furthermore, we develop a simple approach to evaluate the optimal performance under a relaxed constraint on sensing actions, which provides an upper bound on the optimal performance of the original restless multi-armed bandit process. The tightness of the upper bound and the near-optimal performance of Whittle’s index policy are illustrated with simulation examples. When channels are stochastically identical, we show that Whittle’s index policy is equivalent to the myopic policy, which has a simple and robust structure. Based on this structure, we establish approximation factors of the performance of Whittle’s index policy for stochastically identical channels.

Index Terms: Opportunistic access, dynamic channel selection, restless multi-armed bandit, Whittle’s index, indexability, myopic policy.

This work was supported by the Army Research Laboratory CTA on Communication and Networks under Grant DAAD19-012-0011 and by the National Science Foundation under Grants ECS-0622200 and CCF-0830685. Part of this work was presented at the 5th IEEE Conference on Sensor, Mesh and Ad Hoc Communications and Networks (SECON) Workshops (June 2008) and will be presented at the IEEE Asilomar Conference on Signals, Systems, and Computers (October 2008).
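
The myopic policy mentioned in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes each channel is a two-state (good/bad) Markov chain with hypothetical transition probabilities `p01` (bad to good) and `p11` (good to good), a unit reward for each sensed channel found in the good state, and perfect sensing. All function names are illustrative. It shows only the myopic rule (sense the `k` channels most likely to be good); the closed-form Whittle index itself is not given in the abstract and is not reproduced here.

```python
import random


def update_belief(belief, p01, p11, observed=None):
    """Return the probability that a channel is good in the next slot.

    belief   -- probability the channel is good in the current slot
    observed -- True/False if the channel was sensed good/bad, None if not sensed
    """
    if observed is True:
        return p11
    if observed is False:
        return p01
    # Not sensed: propagate the belief one step through the Markov chain.
    return belief * p11 + (1.0 - belief) * p01


def myopic_select(beliefs, k):
    """Return the (sorted) indices of the k channels with the largest beliefs."""
    order = sorted(range(len(beliefs)), key=lambda i: beliefs[i], reverse=True)
    return sorted(order[:k])


def simulate(n_channels, k, p01, p11, horizon, seed=0):
    """Average per-slot reward of the myopic policy (assumed unit rewards)."""
    rng = random.Random(seed)
    stationary = p01 / (p01 + 1.0 - p11)  # stationary probability of "good"
    states = [rng.random() < stationary for _ in range(n_channels)]
    beliefs = [stationary] * n_channels
    total = 0
    for _ in range(horizon):
        chosen = set(myopic_select(beliefs, k))
        total += sum(1 for i in chosen if states[i])
        # Sensed channels reveal their state; the others are only propagated.
        beliefs = [
            update_belief(beliefs[i], p01, p11, states[i] if i in chosen else None)
            for i in range(n_channels)
        ]
        # Every channel evolves whether or not it was sensed (the "restless" part).
        states = [rng.random() < (p11 if s else p01) for s in states]
    return total / horizon


if __name__ == "__main__":
    print(simulate(n_channels=5, k=2, p01=0.2, p11=0.8, horizon=10000))
```

Because the channels in this sketch are stochastically identical, the abstract's equivalence result suggests that this rule coincides with Whittle’s index policy in that setting; for non-identical channels the two policies may differ.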


Similar resources

On Optimality of Myopic Policy for Restless Multi-armed Bandit Problem with Non i.i.d. Arms and Imperfect Detection

We consider the channel access problem in a multi-channel opportunistic communication system with imperfect channel sensing, where the state of each channel evolves as a non-independent and identically distributed Markov process. This problem can be cast as a restless multi-armed bandit (RMAB) problem that is intractable because of its exponential computational complexity. A natural alternative is to ...


Indexability of Restless Bandit Problems and Optimality of Index Policies for Dynamic Multichannel Access

We consider an opportunistic communication system consisting of multiple independent channels with time-varying states. With limited sensing, a user can only sense and access a subset of channels and accrue rewards determined by the states of the sensed channels. We formulate the problem of optimal sequential channel selection as a restless multi-armed bandit process. We establish the indexabil...


Multi-channel opportunistic access : a restless multi-armed bandit perspective. (Accès opportuniste dans les systèmes de communication multi-canaux : une perspective du problème de bandit-manchot)

Cognitive radio, first envisioned by Mitola, is the key enabling technology for future generations of wireless systems that addresses critical challenges in spectrum efficiency, interference management, and coexistence of heterogeneous networks. The core concept in cognitive radio networks is opportunistic spectrum access, whose objective is to solve the imbalance between spectrum scarcity and ...


Structure and Optimality of Myopic Policy in Opportunistic Access with Noisy Observations

A restless multi-armed bandit problem that arises in multichannel opportunistic communications is considered, where channels are modeled as independent and identical Gilbert-Elliot channels and channel state observations are subject to errors. A simple structure of the myopic policy is established under a certain condition on the false alarm probability of the channel state detector. It is show...


Lazy Restless Bandits for Decision Making with Limited Observation Capability: Applications in Wireless Networks

In this work we formulate the problem of restless multi-armed bandits with cumulative feedback and partially observable states. We call these bandits lazy restless bandits (LRB) because they are slow in action and allow multiple system state transitions during every decision interval. Rewards for each action are state dependent. The states of arms are hidden from the decision maker. The goal of t...



Journal:
  • CoRR

Volume: abs/0810.4658

Publication date: 2008